MENTAL AGE NORMS FOR VOCABULARY SCORES
IN THE 1937 STANFORD-BINET*


MARIAN L. WHITE, M. A.
University of Maine* 


Terman (29) was the first American psychologist to include an 
extensive vocabulary test in a revision of the original Binet-Simon 
scale for measuring intelligence. In 1912 (31) when the tentative 
revision of the scale was made, the vocabulary test was included 
because of the facility with which such an item might be ad- 
ministered, the relative ease of establishing age norms, and the 
usefulness of the test for clinical purposes. The vocabulary test 
was also included in the 1916 scale. Terman then referred to it 
as having “a far higher value than any other single test of the 
scale” (29, p. 230). In order to answer certain criticisms that 
had been made about the test, Terman in 1918 (30) made a special
study of the vocabulary and concluded that its reliability is high, that
it does not necessarily measure a very special ability, that it is not
especially dependent upon accident of environment and instruction,
and that invalidation due to variations in personal scoring is
negligible.

Lincoln (16) in a later study verified the reliability reported by 
Terman. Freeman (9) and Jastak (13) both consider the item 
the most valuable one in the entire scale. Brandenburg (4), Good- 
enough (11), and Mahan and Witmer (21) all attest to the validity 
of the item. Louden (17) has shown that when mental age is held
constant the vocabulary score is not affected by variations in
chronological age.

The test has been found to be influenced by sex (29, 30, and 36), 
by socio-economic status (5), and by race (13, 23, and 28). With 
the exception of two studies (20, 23) bilingualism has also been 
shown to exert an influence (13, 15, 22, 25, 30, and 34). Investigators
have found that the two separate lists of the test are unequal in
difficulty (18, 21, and 24), that some of the words are incorrectly
placed (7, 8), and that the scoring standards for some of the
vocabulary items are inaccurate (19, 21, 33). In spite of these
limitations the test remains a valid part of the 1937 scale (3, 6, 7,
12, and 32).

*The author wishes to express her sincere appreciation to Dr. A. D.

The purpose of this study was to develop mental age norms for 
the vocabulary test items on the 1937 Terman and Merrill Revision 
of the Stanford-Binet Scale, Form L, so that the clinician might
have a more refined method of differentiating subjects on the basis
of their vocabulary performance on this scale. That such
differentiation is clinically useful has been demonstrated by Altman
and Shakow (1), Babcock (3), Jastak (13, 14), Simmons (26, 27),
Wells and Kelley (35), and Whitman (37).


PROCEDURE AND RESULTS 


The raw data were obtained from the records of 1937 Stanford- 
Binet tests administered to 1109 children in the public schools of 
Buffalo, New York; 230 children in the public schools of Orono, 
Maine; 29 children in the State School for Girls at Hallowell, Maine; 
and 48 kindergarten children in Bangor, Maine. In all, 1416 records 
were studied. All of the tests were given by trained examiners or by 
advanced students in psychology working under the direction of a 
trained psychologist. 

Preliminary investigation of the data indicated that there were 
very few cases with an I.Q. greater than 125. In order to obtain
as nearly representative a sampling of the population as possible from
the limited data available, only those cases which fell within the
I.Q. range of 75 to 125 were considered. No cases with known
speech defects or marked emotional instability were included. However,
practical limitations made it impossible to control the factors of
bilingualism, race, and socio-economic status. Because of the
inaccessibility of data, the age range was limited to five years six
months through sixteen years six months. Treatment of the
original data showed a positively skewed distribution. The
mean I.Q. was 93.45, and there were nearly twice as many male
cases as female. Because of the small number of cases at the
upper levels, the upper limit of the age range was lowered to 14-5,
and cases above this age were not further considered.

In order to obtain equal sex representation and to make the distri- 
bution comparable to that of Terman and Merrill (32), 663 of the 
cases were eliminated. The percentage of cases at each of five 10- 
step intervals, i.e., 75-84, 85-94, 95-104, 105-114, 115-125, as given 
by Terman and Merrill (32) in their data for the standardization of 
the 1937 scale was determined. Because they give only a graphic 
representation of the percentages, this determination is at best only 
an approximation. Their graph shows that approximately 85% 
of their total number of cases fell within the 75-125 I.Q. range, the 
total range for the data in this study. It was necessary to adjust 
the percentages used by Terman and Merrill so that the five intervals 
would represent 100% rather than 85%. The revised percentages
were found to be: 11.8 for the I.Q. ranges 75-84 and 115-125; 23.5
for 85-94 and 105-114; and 29.4 for 95-104. The number of cases
corresponding to each of these percentages was computed. At the
higher age levels there were so few cases in any given interval that
it was impossible both to retain close to 100 cases and to obtain a
distribution exactly comparable to Terman and Merrill’s. The actual
number of cases in each interval was adjusted to these percentages
by the elimination of excess cases.
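The rescaling described above is proportional arithmetic: each interval's share is divided by the 85% that the 75-125 range represents, and the result is multiplied by the number of usable cases at an age level. The sketch below assumes raw shares of about 10, 20, 25, 20, and 10 per cent; these are back-calculated from the revised figures, since Terman and Merrill give the distribution only graphically. The level size of 80 cases is hypothetical.

```python
# Rescale interval shares so the five I.Q. intervals sum to 100%,
# then convert them into target case counts for one age level.

INTERVALS = ["75-84", "85-94", "95-104", "105-114", "115-125"]

# ASSUMPTION: approximate raw shares (per cent of all cases), back-calculated
# from the revised percentages; together they cover about 85% of cases.
RAW_SHARES = [10.0, 20.0, 25.0, 20.0, 10.0]


def rescale(shares: list[float]) -> list[float]:
    """Scale shares proportionally so that they sum to 100."""
    total = sum(shares)
    return [100.0 * s / total for s in shares]


def target_counts(shares: list[float], n_cases: int) -> list[int]:
    """Cases to retain in each interval for an age level of n_cases.

    Counts are rounded to whole cases, so in general they need not sum
    exactly to n_cases.
    """
    return [round(n_cases * s / 100.0) for s in shares]


revised = rescale(RAW_SHARES)
print([round(p, 1) for p in revised])  # [11.8, 23.5, 29.4, 23.5, 11.8]

# HYPOTHETICAL: an age level with 80 usable cases.
print(dict(zip(INTERVALS, target_counts(revised, 80))))
```

Excess cases in any interval above its target would then be discarded, as the text describes; when an interval falls short of its target, no rescaling can restore the exact proportions.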

Data to be excluded were chosen in a manner that would leave 
an approximately equal number from each sex and eliminate as many 
obviously foreign names as possible. After these requirements were 
met, there were still at some age levels excess cases which were dis- 
carded in a random manner. Table I shows for the corrected data 
the number of cases at each C. A. level, the number from each locality, 
the number of each sex, the average I. Q., and the average number 
of words passed. Figure 1 shows for the selected data the number 
of cases in each of the five 10-step I. Q. intervals. 

For the data retained on the basis of the above procedure, the aver- 
age I.Q. and the average number of words correctly defined at each 
age level were computed. The mid-points of the various C. A. ranges 
were plotted on the abscissa of Figure 2, and against these values
were plotted the average numbers of words correctly defined. The
norms presented in Table II were read from this graph by linear
interpolation. Those credits beyond M. A. 11-6 are especially
tentative because of the particularly small number of cases at that
and the following age levels.
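Reading a norm from the graph amounts to inverting a piecewise-linear curve: find the two adjacent C. A. mid-points whose mean scores bracket a given number of words, and interpolate between them. The sketch below assumes the mean number of words rises strictly with age; the curve points are hypothetical, not the study's Table I values.

```python
from typing import Optional

# HYPOTHETICAL curve: (C.A. mid-point in months, mean words passed).
# Illustrative values only; they are not the study's Table I data.
CURVE = [(72, 5.0), (84, 6.5), (96, 7.5), (108, 9.0)]


def ma_months(curve: list[tuple[int, float]], score: float) -> Optional[float]:
    """Age in months at which the curve reaches `score`, by linear interpolation.

    Assumes mean words strictly increase with age, so each straight segment
    between adjacent points can be inverted. Returns None outside the curve.
    """
    for (age0, w0), (age1, w1) in zip(curve, curve[1:]):
        if w0 <= score <= w1:
            return age0 + (score - w0) * (age1 - age0) / (w1 - w0)
    return None


def years_months(months: float) -> str:
    """Format months in the years-months notation, e.g. 80 -> '6-8'."""
    total = round(months)
    return f"{total // 12}-{total % 12}"


print(years_months(ma_months(CURVE, 6.0)))  # 6-8
print(ma_months(CURVE, 10.0))  # None (beyond the plotted range)
```

A score of 6.0 words lies two-thirds of the way from 5.0 to 6.5 on the first segment, so it maps to two-thirds of the way from 72 to 84 months, i.e. 80 months or 6-8.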

The Pearson product-moment coefficient of correlation computed
between the number of words passed and C. A. was found to be
.81 ± .01. A coefficient of .86 ± .01 was found between the number
of words passed and M. A. over the entire range.
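For readers who wish to recompute such coefficients, the sketch below gives the product-moment formula together with the probable error of r, 0.6745(1 - r²)/√N. Treating the reported "±" as a probable error is an assumption (it was the customary error term in work of this period, but the text does not say so; the standard error would be (1 - r²)/√N). The source does not report N for these coefficients, so the N of 700 below is hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r**2) / sqrt(n).

    ASSUMPTION: the '±' figures in the text are probable errors.
    """
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# HYPOTHETICAL N: the source does not report N for these coefficients.
n = 700
# prints "0.81 0.01", then "0.86 0.01"
for r in (0.81, 0.86):
    print(r, round(probable_error_r(r, n), 2))
```

With a sample of several hundred cases, both coefficients carry an error term that rounds to .01, the order of magnitude reported above.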


DISCUSSION


Because of the representative sampling of cases, the control of sex
differences, and the approximation to Terman’s findings (30, 32), the
author believes these norms to be essentially valid. Those at the
lower end of the scale are more valid, both because the standard
errors of the means are smaller at these points and because
inspection of the scatter diagram used in computing the coefficients
of correlation showed greater variability from year X to the end of
the scale.

However, there are certain limitations imposed by the standardiza- 
tion of the scale. The first of these involves the dependence of the 
data upon the 1937 scale as a criterion of validity and for the selection 
of subjects. Because of the high correlation found between C. A.
and the number of words known by the subjects, this dependence
does not appear to be a serious limitation.

For this study the factors of race, bilingualism, socio-economic
status, geographical distribution of the subjects, and unequal
rural-urban representation remain uncontrolled; these variables may
or may not have cancelled one another out.
TABLE II

SHOWING M. A. EQUIVALENTS FOR
VOCABULARY SCORES

Number of Words    Mental Age*
      5.0              6-3
      5.5              6-7
      6.0              6-11
      6.5              7-4
      7.0              7-8
      7.5              8-0
      8.0              8-6
      8.5              9-0
      9.0              9-4
      9.5              9-8
     10.0             10-0
     10.5             10-4
     11.0             10-7
     11.5             10-10
     12.0             11-2
     12.5             11-6†
     13.0             11-10
     13.5             12-1
     14.0             12-4
     14.5             12-7
     15.0             12-9
     15.5             13-0
     16.0             13-5
     16.5             13-10

*Mental age, i.e., M. A. equivalents for the mean number of words passed
at various C. A. levels.

†Norms beyond this mental age are tentative because of the small number
of cases.
Terman’s findings (30) suggest that the use of results obtained
by many examiners would not invalidate the norms.

The future use of half-credit standards is hesitantly recommended.
Objections to their use rest on the fact that these norms were
derived from vocabulary responses scored as either passed or failed;
the extent to which a half-credit method of scoring would have
changed the norms is at present undetermined. Half-credit may
nevertheless be given on two grounds: Terman (29) gives instructions
for the procedure, and failure to give half-credit may change the
vocabulary M. A. by 4 to 10 months.
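As an illustration of how Table II might be applied when half-credit is given, the sketch below combines full and half credits into a score and looks up its M. A. equivalent. It assumes each half-credit response counts as half a word; that combining rule is an assumption, not a procedure stated in this paper. The M. A. values are copied from Table II, keyed by twice the score so that half-word steps are exact integers, and entries above 12.5 words carry the table's caveat that they are tentative.

```python
from typing import Optional

# M.A. equivalents from Table II, keyed by twice the vocabulary score
# (5.0 words -> 10, 5.5 words -> 11, ...). Rows above 12.5 words are tentative.
TABLE_II = {
    10: "6-3", 11: "6-7", 12: "6-11", 13: "7-4", 14: "7-8", 15: "8-0",
    16: "8-6", 17: "9-0", 18: "9-4", 19: "9-8", 20: "10-0", 21: "10-4",
    22: "10-7", 23: "10-10", 24: "11-2", 25: "11-6", 26: "11-10",
    27: "12-1", 28: "12-4", 29: "12-7", 30: "12-9", 31: "13-0",
    32: "13-5", 33: "13-10",
}


def vocabulary_ma(full_credits: int, half_credits: int = 0) -> Optional[str]:
    """M.A. equivalent for a vocabulary score, or None outside Table II.

    ASSUMPTION: each half-credit response counts as half a word, so the
    doubled score is 2 * full_credits + half_credits.
    """
    return TABLE_II.get(2 * full_credits + half_credits)


print(vocabulary_ma(10), vocabulary_ma(10, 2))  # 10-0 10-7
```

In this example, two half-credit responses raise the score by one word and the M. A. equivalent by seven months, which falls within the 4- to 10-month range mentioned above.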

The results here presented are to be considered refinements, not 
replacements, of the vocabulary norms in the 1937 Revision of the 
Stanford-Binet. They are presented as tentative credit standards to
be used when a sharper differentiation of subjects on the basis of
vocabulary is desired than the complete scale allows.